Exploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig

Authors

Kashish Ara Shakil

Mansaf Alam

Shuchi Sethi

Abstract

The trace contains cell information covering about 29 days and spanning 700k jobs. This paper presents a statistical analysis of this cluster trace. Because the trace is very large, we used Hive, a platform built on the Hadoop Distributed File System (HDFS) for querying and analyzing big data. Hive was accessed through its Beeswax interface, and the data was imported into HDFS through HCatalog. In addition to Hive, we used Pig, a scripting language that provides an abstraction on top of Hadoop. To the best of our knowledge, the analytical method we adopted is novel, and it has yielded several useful insights. Jobs and arrival times were clustered using K-means++. We then analyzed the resulting distributions: job arrival times followed a Weibull distribution, resource usage was close to a Zipf-like distribution, and process runtimes were heavy-tailed.
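The abstract names K-means++ but does not show how it was applied. The following is only a minimal sketch of the technique on one-dimensional arrival times, written with the Python standard library. The function names `kmeans_pp_init` and `kmeans_1d` and the sample timestamps are hypothetical, and this is not the authors' implementation.

```python
import random


def kmeans_pp_init(points, k, rng):
    """Pick k initial centers with k-means++ seeding (D^2 weighting)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            # Sample the next center with probability proportional to d2.
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans_1d(points, k, iters=50, seed=0):
    """Lloyd's algorithm on 1-D data, started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


# Hypothetical arrival times (seconds), not taken from the trace.
arrivals = [0.0, 1.0, 2.0, 1000.0, 1001.0, 1002.0, 5000.0, 5001.0, 5002.0]
print(kmeans_1d(arrivals, k=3))
```

The D^2 weighting makes it unlikely that two initial centers fall in the same dense group, which is the main difference from plain random initialization.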


Similar resources

GENMR: Generalized Query Processing through Map Reduce In Cloud Database Management System

Big Data, Cloud computing, Cloud Database Management techniques, Data Science and many more are the fantasizing words which are the future of IT industry. For all the new techniques one common thing is that they deal with Data, not just Data but the Big Data. Users store their various kinds of data on cloud repositories. Cloud Database Management System deals with such large sets of data. For p...


A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images

The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...


Pig vs Hive: Benchmarking High Level Query Languages

This article presents benchmarking results of two benchmarking sets (run on small clusters of 6 and 9 nodes) applied to Hive and Pig running on Hadoop 0.14.1. The first set of results was obtained by replicating the Apache Pig benchmark published by the Apache Foundation on 11/07/07 (which served as a baseline to compare major Pig Latin releases). The second results were obtained by applying ...


A Comparison of Hadoop Tools for Analyzing Tabular Data

The paper describes the application of Hadoop modules: MapReduce, Pig and Hive, for processing and analyzing large amounts of tabular data acquired from a computer simulation of heat transfer in bio tissues. The Apache Hadoop is an open source environment for storing and analyzing BigData. It was installed on a cluster of six computing nodes, each with four cores. The implemented MapReduce job ...


Hive Collective Intelligence for Cloud Robotics: A Hybrid Distributed Robotic Controller Design for Learning and Adaptation

The recent advent of Cloud Computing, inevitably gave rise to Cloud Robotics. Whilst the field is arguably still in its infancy, great promise is shown regarding the problem of limited computational power in Robotics. This is the most evident advantage of Cloud Robotics, but, other much more significant yet subtle advantages can now be identified. Moving away from traditional Robotics, and appr...



Journal title: CoRR

Volume: abs/1503.06600

Publication date: 2015
